Distinctions between Choroidal Neovascularization and Age Macular Degeneration in Ocular Disease Predictions via Multi-Size Kernels ξcho-Weighted Median Patterns

Age-related macular degeneration is a visual disorder caused by abnormalities in a part of the eye’s retina and is a leading source of blindness. The correct detection, precise location, classification, and diagnosis of choroidal neovascularization (CNV) may be challenging if the lesion is small or if Optical Coherence Tomography (OCT) images are degraded by projection and motion. This paper aims to develop an automated quantification and classification system for CNV in neovascular age-related macular degeneration using OCT angiography images. OCT angiography is a non-invasive imaging tool that visualizes retinal and choroidal physiological and pathological vascularization. The presented system is based on new retinal layers in the OCT image-specific macular diseases feature extractor, including Multi-Size Kernels ξcho-Weighted Median Patterns (MSKξMP). Computer simulations show that the proposed method: (i) outperforms current state-of-the-art methods, including deep learning techniques; and (ii) achieves an overall accuracy of 99% using ten-fold cross-validation on the Duke University dataset and over 96% on the noisy Noor Eye Hospital dataset. In addition, MSKξMP performs well in binary eye disease classifications and is more accurate than recent works in image texture descriptors.


Introduction
Age-related macular degeneration is a visual disorder caused by abnormalities in a part of the eye's retina and is a leading source of visual impairment Ref. [1]. Therefore, early diagnosis and treatment is critical Ref. [2]. Recently, retinal optical coherence tomography (OCT) images have been used to attain information regarding the health of the posterior eye (e.g., the retina and choroid) Ref. [3]. OCT is a quick, non-invasive medical imaging tool that uses low coherence interferometry to produce cross-sectional images of the retina and optic nerve head (ONH), or the most anterior part of the visual pathway, from the retina to lamina the cribrosa, which assess visual disorders, such as optic nerve disease, qualitative and quantitative. High-resolution (in µm range) scanner techniques such as optical coherence tomography (OCT) produce three-dimensional cross-sectional images of the eye's biological tissues to visualize the individual layers of the posterior segment of the eye, allowing the diagnosis and monitoring of ocular diseases and anomalies Ref. [4]. This development of image acquisition reduces the cost of storage, which allows ophthalmologists to utilize these images to diagnose various eye diseases Refs. [5,6]. However, ophthalmologists would manually interpret each OCT image in the volumes to make a diagnosis decision. The increased data makes manual interpretation of the OCT volumes time-consuming Ref. [7].  [8], top: CNV class, bottom: Drusen class; (b) Images taken from [9], top: DME class, bottom: Normal class.
The correct detection, precise location, classification, and diagnosis of choroidal neovascularization (CNV) may also be challenging if the lesion is small or if OCT images are degraded by speckle noises. Speckle noises in OCT images must be addressed to achieve good classification performance. Speckle noise, called multiplicative noise, is a granular noise texture. It degrades image quality as a consequence of interference among wavefronts in coherent imaging systems, such as radar, laser imaging, medical ultrasound, and optical coherence tomography. The speckle noise is signal-dependent and governed by the Fisher-Tippett distribution. Mathematically, Speckle noise Refs. [10][11][12][13] is expressed as u(x,y) = v(x,y) + v(x,y)η(x,y), where (x,y) are pixel position, v(x,y) is the clean image, u(x,y) is the noisy image, η is a Gaussian noise distribution with zero-mean and some variance σ 2 . It is well known that smoothing filters can reduce speckle noise. These techniques convolve an image with various neighbor sizes, 3 × 3, 5 × 5, etc., using a weighted average or selecting a median value, i.e., a non-weighted median filter. However, selecting a median value without any weights from a high noise-density image is not practical because a noise pixel may be selected instead of an image pixel. Therefore, we propose a new texture descriptor, MSKξMP, which echoes or repeats pixel values in a kernel to encode OCT images while avoiding a high level of speckle noises. This paper makes the following contributions: Figure 1. (a) Images taken from [8], top: CNV class, bottom: Drusen class; (b) Images taken from [9], top: DME class, bottom: Normal class.
The correct detection, precise location, classification, and diagnosis of choroidal neovascularization (CNV) may also be challenging if the lesion is small or if OCT images are degraded by speckle noises. Speckle noises in OCT images must be addressed to achieve good classification performance. Speckle noise, called multiplicative noise, is a granular noise texture. It degrades image quality as a consequence of interference among wavefronts in coherent imaging systems, such as radar, laser imaging, medical ultrasound, and optical coherence tomography. The speckle noise is signal-dependent and governed by the Fisher-Tippett distribution. Mathematically, Speckle noise Refs. [10][11][12][13] is expressed as u(x,y) = v(x,y) + v(x,y)η(x,y), where (x,y) are pixel position, v(x,y) is the clean image, u(x,y) is the noisy image, η is a Gaussian noise distribution with zero-mean and some variance σ 2 . It is well known that smoothing filters can reduce speckle noise. These techniques convolve an image with various neighbor sizes, 3 × 3, 5 × 5, etc., using a weighted average or selecting a median value, i.e., a non-weighted median filter. However, selecting a median value without any weights from a high noise-density image is not practical because a noise pixel may be selected instead of an image pixel. Therefore, we propose a new texture descriptor, MSKξMP, which echoes or repeats pixel values in a kernel to encode OCT images while avoiding a high level of speckle noises. This paper makes the following contributions: This paper has the following structure: the next section contains a Literature Review that offers an overview of methods that give detailed descriptions of the MSKξMP algorithm; the computer simulations section gives numerous experiments showing the performance of MSKξMP; finally, the conclusion and future work are presented in the last section.

Literature Review
In recent years, deep learning networks can perform vision recognition in various applications such as self-driving cars, natural language and image processing, and medical diagnosis (e.g., ocular diseases). Ref. [7] proposes an algorithm imitating how ophthalmologists form diagnoses by focusing on the information provided by OCT images during the classification process. This process is called a B-scan attentive convolutional neural network (BACNN). A self-attention module is used to cumulate features based on their clinical importance to obtain feature vectors. Ref. [1] proposes a multipath CNN architecture for the diagnosis of AMD. This architecture has five convolutional layers to classify AMD or normal images.
The multipath convolution layers extract critical global structures with a large filter kernel and use a sigmoid function as the classifier. Ref. [14] presented a CNN based on surrogate-assisted classification that classifies retinal OCT images automatically. Initial preprocessing was performed on each image using image denoising, thresholding, and morphological dilation to locate binary masks of the retina regions. The preprocessed images were employed to produce surrogate images in image augmentation, which were used to train the CNN model.
Several hand-crafted feature techniques show good classification performance on ocular disorder classifications in OCT images. Ref. [4]'s algorithm extracts features using multiscale histograms of oriented gradient descriptors and are classified using a support vector machine. Ref. [15] proposes the automated detection of AMD and DME from retina OCT images based on sparse coding, spatial pyramids, global representations, and dictionary learning. This process is coupled with preprocessing and a support vector machine classifier. Ref. [6] proposed a multiclass model for detecting AMD, DME, and normal using linear configuration patterns (LCPs) to extract pyramid and multiscale features. Recently, [10] proposed a texture descriptor called Alpha Mean Trimmed local binary patterns (AMT-LBP) based on Alpha Mean Trimmed Filter. The AMT-LBP encodes image pixels while partially avoiding and exploiting speckle noises. Table 1 summarizes the rest of the other recent techniques in ocular disorder classification.
The current works using hand-crafted features fall short compared to the deep learning technique in achieving extremely high classification accuracies. However, deep-learning models usually require long training times and heavy computational hardware such as GPUs. They are also data-hungry and complex. A simple, aggressive machine-learning approach is proposed here to overcome these shortcomings. This technique contains a smaller number of weights, making them easier to implement, requires less training time, and does not require specialized hardware Ref. [19]. Textural information is vital in analyzing eye diseases. Thus, this paper presents a novel texture descriptor, Multi-Size Kernel ξcho-Weighted Median Patterns, to distinguish between the various ocular diseases while achieving very high classification accuracies.

Methods
Our algorithm has the following steps.

Preprocessing
OCT scanner misalignment between the eye and sensor during the acquisition of the retinal images cause white areas in the image. Therefore, preprocessing Ref. [10] starts with removing white areas by assigning white pixels to black pixels. Then, the image is resized from a square image to a rectangular 256 × 512 image. Next, the image is binarized using Otsu thresholding Ref. [20], which generates a binary image (black and white), and a non-weighted median filter is applied afterward. The purpose of this median filter is not to remove noise in the image but to remove white areas in the binary image outside the retinal regions. A morphological dilated operator with a large structuring element is used on the median filtered image to close fluid region structures in DME and CNV images. Finally, the retinal layers are flattened by applying the polynomial fit of either the 2nd order or 3rd order based on the R 2 fitness function. In most cases, the 3rd-order polynomial is selected to flatten the retinal layers.

Hand-Crafted Features
This section presents new, so-called Multi-Size Kernels ξcho-Weighted Median Patterns (MSKξMP), hand-crafted features. This is an extension of the Local Binary Patterns (LBP) feature extractor [21] and is used in many applications such as texture analysis, face recognition, object detection, fault diagnosis, and image retrieval. The advantages of LBP are that it is computationally simple, efficient, and is invariant to illumination. However, the disadvantages of LBP are its production of artifacts and noises, which distort the central pixel value causing classification degradation. To improve the robustness of LBP, different modifications of LBP were proposed.
In this proposal, we start with the generalized form of the LBP given by the following: where I C is the center pixel, I i are the surrounding pixels, f (i) is equal to 2 i and R is a region defined by the kernel size. The kernel size is usually selected to be 3 × 3; however, sizes such as 5 × 5 and 7 × 7 are also known to be used. s( where T is selected to be zero. The generalized LBP operates on all the pixels in an image using a specific kernel size. Each of these kernels is placed over a pixel, I C , and is compared to its surrounding neighboring pixels I i using Equation (1). If a neighbor pixel is greater than or equal to the center pixel, then s(·) is assigned 1; if a neighbor pixel is less than the center pixel, then s(·) is assigned 0. A binary sequence is obtained, and each sequence is assigned to the appropriate decimal weight, 2 i , which is then converted into a decimal value. The decimal weights, 2 i , are then summed and encoded to structural information, generating an LBP image. The LBP is simple, fast, easy, and robust to illumination changes. However, it suffers when speckle noises are present. The median LBP Ref. [22] was proposed to address these noises. The median LBP can be implemented by replacing I C in (1) with the median value of all the pixels within the region R: I C = I median = median(R) = median(I 0 , I 1 , I 2 , . . . I N−1 ), where N is the number of pixels in region R. However, the current median LBP does not go far enough to prevent speckle noises from falling into the texture-based calculation. For example, Figure 2a  3 × 3 kernel with the following values [135, 75, 75; 135, 75, 75; 160, 135, 135]. The 75's are image pixels without noise and the 135's and 160's are noisy pixels. When we select the median value within the 3 × 3 kernel, we obtain 135, a noisy pixel. To overcome this deficiency, Multi-Size Kernels ξcho-Weighted Median Patterns are proposed. Figure 2. (a) A noisy DME OCT image taken from [9] with a 3 × 3 kernel of clean and noisy pixels; median is calculated; (b) Finding the median using echo-based weights; (c) A sample calculation of medianLBP5 × 5.

Multi-Size Kernels Echo-Weighted Median Patterns
MSKξMP is a textural feature descriptor similar to median LBP. MSKξMP encodes the textural pattern information by comparing surrounding pixels to the median pixel Imedian. However, this median value is calculated differently than Ref. [22]. Its surrounding pixels are defined using six different kernel sizes represented using Rn×m, where n's and m's have the following values: 3 × 3, 5 × 5, 7 × 7, 3 × 5, 5 × 7, and 3 × 7. Numerous kernel sizes are used because each kernel captures a variation of texture within the OCT; one kernel can miss a specific feature that another kernel size picks up. A larger kernel like the 7 × 7 can capture regional details that a 3 × 3 kernel can miss and vice versa.
The MSKξMP is calculated in three steps for each kernel size, Rn×m. The first is to calculate Imedian, the second is to find each mi's found in Rn×m, and the third step is to calculate each medianLBPn×m using parameters obtained from steps one and two. The medi-anLBPn×m is defined by the following equation: where IC is the current pixel being encoded and is the center pixel of the kernel. The combination of all medianLBPn×m are the MSKξMP. The calculation of Imedian is motivated by the weighted median filter in Ref. [23], which echoes or repeats pixel values a predetermined number of times. Figure 2b illustrates the 3 × 3 kernel, but now echo-weights are applied. The integer of each weight indicates the number of repetitions, e.g., the second element, 75, is repeated two times, or the middle element, 75, is repeated three times. After

Multi-Size Kernels Echo-Weighted Median Patterns
MSKξMP is a textural feature descriptor similar to median LBP. MSKξMP encodes the textural pattern information by comparing surrounding pixels to the median pixel I median . However, this median value is calculated differently than Ref. [22]. Its surrounding pixels are defined using six different kernel sizes represented using R n×m , where n's and m's have the following values: 3 × 3, 5 × 5, 7 × 7, 3 × 5, 5 × 7, and 3 × 7. Numerous kernel sizes are used because each kernel captures a variation of texture within the OCT; one kernel can miss a specific feature that another kernel size picks up. A larger kernel like the 7 × 7 can capture regional details that a 3 × 3 kernel can miss and vice versa.
The MSKξMP is calculated in three steps for each kernel size, R n×m . The first is to calculate I median , the second is to find each m i 's found in R n×m , and the third step is to calculate each medianLBP n×m using parameters obtained from steps one and two. The medianLBP n×m is defined by the following equation: where I C is the current pixel being encoded and is the center pixel of the kernel. The combination of all medianLBP n×m are the MSKξMP. The calculation of I median is motivated by the weighted median filter in Ref. [23], which echoes or repeats pixel values a predetermined number of times. Figure 2b illustrates the 3 × 3 kernel, but now echo-weights are applied. The integer of each weight indicates the number of repetitions, e.g., the second element, 75, is repeated two times, or the middle element, 75, is repeated three times. After applying the echo weight to the 3 × 3 kernel, the median was determined to be 75 or I median = 75, which is not a noise pixel. These weights can be determined by centering a two-dimensional Gaussian function onto I C , defined by the following: where σ x and σ y are standard deviations in the x and y directions, respectively. When a square kernel is utilized, the standard deviations should be the same, σ x = σ y , and when a rectangular kernel is utilized, the standard deviations could be different, σ x = σ y . The selection of σ x and σ y is flexible, as long as the center pixel has the most weight and the surrounding weights are not too small compared to the center. Please see Figure 3, "MSKξMP Weights". Notice the center weight is only one value higher than its neighboring values. The Gaussian Function is then normalized by its minimum value, min(G(x,y)), to ensure its minimum value equals 1. We then took the ceiling of each element in the normalized Gaussian Function to obtain integer values (or repetition values). The Echo formula can be represented by the following: where · indicates the ceiling operator, and ξ(x,y) represents the echo weights of our kernels. The standard deviations utilized in the 3 × 3 weight kernel, shown in Figure 3, are σ x = σ y = 1.2. The values of the standard deviations should increase with increasing kernel size. The weight of the center pixel should always have the highest repetition because this gives a non-noise pixel a better chance of being selected as the median. Figure 3 shows MSKξMP images associated with their echo weights for all the kernel sizes used in this paper. Notice that the MSKξMP images with weights 5 × 7 and 7 × 7 had the coarsest texture, meaning there were more texture variations and they would provide the highest performance. For example, the MSKξMP of the CNV images can extract drusen particles better than the Fibonacci patterns. The DME image had the most noise, however, the 3 × 3 weighted MSKξMP image could encode less noise compared to the classical LBP.
Finally, I median can be written in terms the pixel intensities in R n×m : where is a duplication operator, ε 0 , ε 1 , . . Finally, Imedian can be written in terms the pixel intensities in Rn×m: where △ is a duplication operator,  Once all the mi's are calculated, we moved to the third step, where we compared mi's to Imedian found in the first step using (2). The comparison encoded the mi's to binaries which were then multiplied by their respective decimal weights, and the sum was calculated to obtain medianLBP5 × 5(IC). The rest of the medianLBPn×m's are obtained similarly. Figure 3 shows that each medianLBPn×m image displays different textural information, which is beneficial in distinguishing the different types of ocular disorders.
Four histograms were generated for each medianLBPn×m image to extract their features. One histogram was generated from the entire medianLBPn×m image, which represented the global features. The other three histograms were generated from local overlapping regions and labeled with different color brackets, as seen in Figure 4. Each local regions had a size of 256 × 256 to capture local features. The four histograms from each medianLBPn×m are concatenated to form the MSKξMP feature vector for each OCT image.

Datasets
Three datasets were used to test the performance of MSKξMP. The first dataset, designated as Dataset 1, was taken from Duke University, Harvard University, and the Uni- Once all the m i 's are calculated, we moved to the third step, where we compared m i 's to I median found in the first step using (2). The comparison encoded the m i 's to binaries which were then multiplied by their respective decimal weights, and the sum was calculated to obtain medianLBP5 × 5(I C ). The rest of the medianLBP n×m 's are obtained similarly. Figure 3 shows that each medianLBP n×m image displays different textural information, which is beneficial in distinguishing the different types of ocular disorders.
Four histograms were generated for each medianLBP n×m image to extract their features. One histogram was generated from the entire medianLBP n×m image, which represented the global features. The other three histograms were generated from local overlapping regions and labeled with different color brackets, as seen in Figure 4. Each local regions had a size of 256 × 256 to capture local features. The four histograms from each medianLBP n×m are concatenated to form the MSKξMP feature vector for each OCT image.

Datasets
Three datasets were used to test the performance of MSKξMP. The first dataset, designated as Dataset 1, was taken from Duke University, Harvard University, and the University of Michigan Ref. [4]. This dataset consisted of SD-OCT volumetric scans acquired from 45 patients: 15 normal patients, 15 patients with dry AMD, and 15 patients with DME. All SD-OCT volumes were acquired in Institutional Review Board-approved protocols using Spectralis SD-OCT (Heidelberg Engineering Inc., Heidelberg, Germany). The second dataset, which will be designated as Dataset 2, was acquired from the Noor Eye Hospital dataset Ref. [9] and consisted of 148 SD-OCT volumes (48 AMD, 50 DME, and 50 Normal), acquired by using the Heidelberg SD-OCT imaging system at Noor Eye Hospital in Tehran (NEH). Each volume consisted of 19 to 61 B-scans; the resolution of the B-scans was 3.5 µm, and the scan dimension was 8.9 × 7.4 mm 2 .
The third dataset, Dataset 3, was also collected by the Heidelberg SD-OCT imaging system at Noor Eye Hospital (NEH) and was obtained from the Mendeley database website Ref. [8]. It contained 16,822 OCTs images of Normal (120 volumes), Drusen (160 volumes), and CNV (161 volumes). However, 12,641 images (3234 CNV, 3740 Drusen, and 5667 Normals) were selected for our experiments. This is because we were only keeping worst-case condition images for each volume, whereas if a patient was labeled a CNV case, only CNV B-scans within that volume were included for training and testing procedures. Normal and drusen B-scans of that patient were excluded. Note that drusen particles are early signs of aged macular degeneration, which can be treated in the same class as AMD.

Feature Weightings, Selection, and Classification
Principal Component Analysis (PCA) is one of the oldest and most widely used statistical techniques that reduces feature dimensionalities. PCA improves interpretability while preserving variability, which minimizes informational loss. This means finding new variables F i that are linear functions of those in the original features, while successively maximizing variance and maintaining uncorrelatability. This allows the representation of a class of features in the following form: where λ i eigenvalues are sorted according to the size as λ 1 ≥ λ 2 ≥ . . . ≥ λ p ≥ 0, F i , i = 1, 2, . . . p, features, F 1 is the best feature and the 2nd, 3rd, . . . , kth (k ≥ 4) are other principal components.
The PCA used in this paper was based on Singular Value Decomposition (SVD), which was used to select our best features. X = UΣV T represents the SVD of an input matrix X, where U represents the reflection of X, the diagonal of Σ represents the variabilities, σ 1 , . . . , σ η , up to the η-th feature, V T is the 'eigen vectors' of X. X will be subtracted by its mean of each column of X mean , to obtain X m = X − X mean . This subtraction is commonly used for normalizing PCA data. We then calculated X PCA , which is defined by the following: where M is selected based on the percentage of variability up to η, η ≥ M, N is the number of total observations in the dataset and the dimensions of X PCA is N rows by M columns. Our feature selection also included feature weighting using neighborhood component analysis (NCA) Ref. [24] to refine our feature searching process. NCA is a feature weighting algorithm utilizing the nearest neighbor to maximize expected leave-one-out classification accuracy. Then NCA was optimized using stochastic gradient accent to learn a feature weighting vector. The output of the NCA is a vector, ω, of length of η representing the weight importance of each feature. Each element in ω is assigned to each column of matrix X PCA .
We then searched for sets of optimal features by reducing the number of columns of X PCA through iterations. The values of the weights in ω were used for removing features from X PCA at iteration. The weights can be written as the following: where M l , is the number of features remaining after some features have been removed at lth iteration. This idea was taken from Ref. [25] "Drop Out", where a Neural Network with a large number of parameters tended to overfit a dataset, causing a decrease in performance. The function of dropout is to drop neuron units with low weights from the neural network during training after multiplying feature weights to feature vectors. Our algorithm utilized this idea to drop features with weights that were below a certain threshold. By dropping these unimportant features, we achieved faster training times and prevented overfitting. To multiply feature weights to feature vectors, ω can be rewritten in matrix form by replicating each element and putting them in column wise fashion. Ω (l) is defined by the following: where Ω has N number of rows. We can rewrite our input classification matrix, X PCA (l) , in terms of Ω and at lth iteration using the following: where x i,PCA (l) is the feature vector in the ith row of X PCA (l) , "·" indicates point matrix multiplication operation, τ ω (l) is the weight removal threshold at lth iteration, and X m,N×M l is the updated feature matrix with M 1 number of features. The features that are dropped have feature weights below τ ω (l) at the l iteration. X PCA (l) is a new classification input matrix and is inputted to each of the classifiers. The classifiers used during each iteration are RUS-Boost Ref. [26], and Naïve Bayes Ref. [27] and support vector machine using Polynomial and Gaussian Ref. [28], Random Forest Ref. [29] and Adaboost Ref. [30]. τ ω (l) values selected for this paper ranged from 0.1 to 1 with increments of 0.01. The classification accuracies, a(l,k), were recorded in each lth iteration at the kth classifier. a(l,k) was defined by the following: a(l, k) = C k X PCA (l) , k = 1, 2, . . . , 6 where C k was one of the kth classifiers listed above. The maximum accuracy was found using the following: where argmax(·) determines the maximum location or accuracy of lth at kth classifier. Dataset 1, 2, and 3 were trained in this fashion. The η was selected to be 100 for Dataset 1, and 70 for Dataset 2 and 3. Figure 4 shows the MSKξMP OCT Image recognition system from start to finish.

Performance Evaluation
To evaluate and determine the performance of the proposed feature extraction approach, the accuracy, sensitivity, specificity, precision, and F1-score were compared with the results of HOG Ref. [4], BACNN Ref. [7], Alpha Mean Trimmed Patterns Ref. [10], Fibonacci Patterns Ref. [19], Classical LBP Ref. [21], Vision Transformer Ref. [31], ResNet Ref. [32], VGG16 Ref. [33], and Inception V3 Ref. [34]. By definition, higher values on these indexes imply better quality measures of classification. Mathematical formulas of these measurements are given below: where Tp and Tn are the true positive and true negative and Fp and Fn are the false positive and false negative, respectively.

Results and Discussions
For datasets 1 and 2, four different classification schemes were tested using MSKξMP, AMD vs. DME vs. Normal, AMD vs. DME, DME vs. Normal, Normal vs. AMD, and all classes. Since dataset 1 was slightly imbalanced, AMD (723 images), DME (1101), and has Normal (1407), using accuracy as the only measurement would not be enough to detail the efficacies of the MSKξMP. Using measurements such as recall, specificity, precision, and F1-score was more effective. For dataset 3, we also tested using four classification schemes: Normal vs. CNV, CNV vs. Drusen, Normal vs. Drusen, and all classes, Normal vs. CNV vs. Drusen. Our experiments included kernel sizes up to 3 × 9 and 5 × 9, however, these window sizes were too large, and important features detected may not be within the local areas of the pixel, I C . Therefore, kernel sizes up to n × 9 were omitted from the result section. As using more kernels with varying sizes would have diminishing returns and performance improvements would be minimal. Using six kernels is the right balance between computation times and performances of our MSKξMP.

Dataset 1, 2 and 3 Results
The highlights of our experiments on datasets 1 and 2 are shown in Table 2 and Figure 5, which show the performance measurements of our MSKξMP. The highest accuracy was achieved by SVM using polynomial kernel at 99.78% on dataset 1 and 96.59% on dataset 2. We achieved 100% on some of the binary classifications, Normal vs. AMD and Normal vs. DME, both using SVM with polynomials as well. Rus-Boost and Naïve Bayes seem to perform the worst, while Adaboost and Random Forest have similar performance to each other. This is because Adaboost and Random Forest both utilize an ensemble of weaker classifiers to create one stronger classifier: K-nearest-neighbor Classifiers to create Adaboost and decision trees for creating Random Forest. The highest accuracy was achieved by SMV polynomial. Regarding misclassifications, SVM polynomial kernel did not misclassify any normal images, only misclassified two DME class images, and using SVM Gaussian only two DME images are misclassified for dataset 1, see confusion matrixes Figure 5. Something to note on dataset 2's results are their sensitivities (recall) and specificity, notice the specificity was 98.33%, which is higher than the accuracy and sensitivity; this suggests high confidence in their true negatives. weaker classifiers to create one stronger classifier: K-nearest-neighbor Classifiers to create Adaboost and decision trees for creating Random Forest. The highest accuracy was achieved by SMV polynomial. Regarding misclassifications, SVM polynomial kernel did not misclassify any normal images, only misclassified two DME class images, and using SVM Gaussian only two DME images are misclassified for dataset 1, see confusion matrixes Figure 5. Something to note on dataset 2's results are their sensitivities (recall) and specificity, notice the specificity was 98.33%, which is higher than the accuracy and sensitivity; this suggests high confidence in their true negatives.  The main takeaway from the results in dataset 3 is that our MSKξMP was able to achieve 89.3% in accuracy using SVM polynomial kernel with specificity of 94.37%. This means high confidences in the true negative diagnoses. The other classifiers performed similarly to dataset 1 and dataset 2. Normal vs. CNV achieved the highest accuracy at 98.53% due to the significant differences between their visual features. It can be observed that MSKξMP was able to achieve 93.69% accuracy with the CNV vs. Drusen classification scheme. This is a good score because while CNV images contain drusen particles that may cause some confusion with AMD images, our feature extractor was able to differentiate between these two classes reliably. Figure 6 shows ROC curves for our best results "SVM: Polynomial" in all three datasets. For each dataset, we plotted three ROC curves with one class being the positive class of the other two classes. For example, dataset 1 have three classes: Normal, DME, and AMD. One curve plots the True Positives vs. the False Positives using Normal as the positive class and DME and AMD as the negative classes. The area under the curves (AUC) are also computed for each respective curve. Notice the AUCs for dataset 1 were higher and the AUCs of dataset 3 were lower, which reflects their respective accuracies, sensitivities, and specificity. cause some confusion with AMD images, our feature extractor was able to differentiate between these two classes reliably. Figure 6 shows ROC curves for our best results "SVM: Polynomial" in all three datasets. For each dataset, we plotted three ROC curves with one class being the positive class of the other two classes. For example, dataset 1 have three classes: Normal, DME, and AMD. One curve plots the True Positives vs. the False Positives using Normal as the positive class and DME and AMD as the negative classes. The area under the curves (AUC) are also computed for each respective curve. Notice the AUCs for dataset 1 were higher and the AUCs of dataset 3 were lower, which reflects their respective accuracies, sensitivities, and specificity.

Ablation Study
For our ablation study, we tested the feature extraction capabilities of each kernel size and weights (shown in Figure 3) on dataset 1. From Table 3, it can be observed that the accuracies measured were highest with the 5 × 7 and 7 × 7 kernels. For each kernel size, the accuracies were about 0.5-3% below our highest achieved accuracy of 99.78% (all ker-

Ablation Study
For our ablation study, we tested the feature extraction capabilities of each kernel size and weights (shown in Figure 3) on dataset 1. From Table 3, it can be observed that the accuracies measured were highest with the 5 × 7 and 7 × 7 kernels. For each kernel size, the accuracies were about 0.5-3% below our highest achieved accuracy of 99.78% (all kernels). Concatenating the feature vectors generated from each kernel size helped improve the performances of our MSKξMP and indicated that each sized kernel captured different textural information from the OCT image, both locally and regionally. Based on this ablation study, we decided to omit the 3 × 3 kernel and test combined features generated by 5 × 5, 7 × 7, 3 × 5, 5 × 7 and 3 × 7. We were able to achieve an accuracy of 99.78% by SVM polynomial kernel without using the 3 × 3 kernel.

Comparisons to Recent State of the Art
Comparing dataset 1 with recent state of the art shows our MSKξMP is able to achieve slightly higher accuracies, as shown in Tables 4 and 5. Das Ref. [7]'s BACNN achieved an accuracy of 97.76% with a plus or minus of 2.24%, meaning BACNN was able to achieve 100% but with extremely high fluctuation. Even though our experiments were based on ten-fold validations, our fluctuation was only about 0.3-0.4%, which is more stable than BACNN. Compared to other state of the arts, our MSKξMP outperformed very deep CNNs such as Refs. [13,32,34] as well as the recent discovery of vision transformer algorithm Ref. [31]. Also, our MSKξMP was able to outperform recent state of the art texture descriptors such as Refs. [10,19], by 99.78% to 94.96% and 99.16% respectively, see Table 6. Table 7 shows datasets 2 and 3 comparisons between the recent state of the art and our MSKξMP. Our accuracies either outperformed or were comparable against deep learning techniques such as Ref. [35]. Note that Ref. [35] had high fluctuations across all performance measurements, ±0.8-3%, whereas our results with MSKξMP only fluctuated within 0.1-0.2%.

Conclusions and Future Work
This work presents a new image textural descriptor, MSKξMP, for differentiating between ocular diseases such as AMD, Drusen, DME, and CNV in OCT images. The presented method can be used to encode textural patterns at local and regional scales and to improve edges in various directions while avoiding speckle noises. The computer simulations show that our measurements outperformed (a) the current state-of-the-art deep learning techniques and vision transformer; and (b) the FPN-EfficientNetB0 in dataset 3 and is comparable to the FPN-ResNet50 network. Some of our binary classifications from dataset 1 also performed well, achieving perfect accuracy, or close to it. MSKξMP had high specificity, suggesting high confidence in their true negatives. Future work should be focused on constructing 3D retinal images, such as 3D OCT images, by extracting volume and depth information to determine the spread of the disease. The 3D structures should be used for multi-view classification.
Author Contributions: A.L. and S.A., methodology; A.L., validation; A.L. and S.A., formal analysis, investigation; A.L. and S.A., data generation; A.L. and S.B., writing-original draft preparation; S.A. and S.B.; writing-review and editing. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.